class: center, middle, inverse, title-slide .title[ # ISA 444: Business Forecasting ] .subtitle[ ## 04: Time Series EDA ] .author[ ###
Fadel M. Megahed, PhD
Endres Associate Professor
Farmer School of Business
Miami University
@FadelMegahed
fmegahed
fmegahed@miamioh.edu
Automated Scheduler for Office Hours
] .date[ ### Fall 2023 ]

---
# Quick Refresher from Last Class

✅ Read CSV files in both R and Python.

✅ Construct line charts and seasonality plots in both R and Python.

✅ Utilize the project workflow in
and create
and
scripts.

❌ Access, subset, and create `ts()` objects in R.

---
# Learning Objectives for Today's Class

- Examine the goals of utilizing line charts in time-series analysis (i.e., detect trends, seasonality, and cycles).
- Develop a deeper understanding of the grammar of graphics, which we used to create time series plots in R and Python.
- Use numerical summaries to describe a time series.
- Explain what we mean by correlation.

---
class: inverse, center, middle

# A Taxonomy of Time Series Plots and their Interpretation

---
# A Structured Approach for Time Series Viz

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../figures/ts_plots.png" alt="A Potential Framework for Time Series Visualization" width="100%" />
<p class="caption">A Potential Framework for Time Series Visualization</p>
</div>

.footnote[
<html>
<hr>
</html>

This is my best attempt to improve on the general advice provided in the previous slide. Many of the suggestions presented in this flow chart stem from my past and current research/consulting collaborations. They are by no means a comprehensive list of everything that you can do.
]

---
# The Line Chart

.pull-left[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/cincyplot1-1.png" alt="A plot of a long time-series for monthly weather in Cincinnati, with color denoting different months." width="100%" />
<p class="caption">A plot of a long time-series for monthly weather in Cincinnati, with color denoting different months.</p>
</div>
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/cincyplot2-1.png" alt="A snippet of the time-series (last 5 full years) for monthly weather in Cincinnati, with color denoting different months." width="100%" />
<p class="caption">A snippet of the time-series (last 5 full years) for monthly weather in Cincinnati, with color denoting different months.</p>
</div>
]

---
# The Line Chart: Practical Considerations

.can-edit.key-activity0_[
**Things to Consider:** .font70[(Insert below)]

- **Format your data:** ....
- **Entire time-series vs a snippet:** ....
- **On the use of color:** ....
- **On grouping the data:** ....
]

---
# On the Interpretation of Line Charts
.panelset[

.panel[.panel-name[Activity]

> Over the next 5 minutes, please identify what you have learned from the charts in each tab.

- Write down your answers in the last tab (it is editable).
- Discuss your answers with your neighboring classmates.
- Be prepared to share these answers with the class.

]

.panel[.panel-name[Book Stores]

<center>
<iframe src="https://fred.stlouisfed.org/graph/graph-landing.php?g=ZopX&width=800&height=420" scrolling="no" frameborder="0" style="overflow:hidden; width:800px; height:500px;" allowTransparency="true" loading="lazy"></iframe>
</center>

]

.panel[.panel-name[GDP 1]

<center>
<iframe src="https://fred.stlouisfed.org/graph/graph-landing.php?g=ZorB&width=800&height=420" scrolling="no" frameborder="0" style="overflow:hidden; width:800px; height:500px;" allowTransparency="true" loading="lazy"></iframe>
</center>

]

.panel[.panel-name[GDP 2]

<center>
<iframe src="https://fred.stlouisfed.org/graph/graph-landing.php?g=ZorL&width=800&height=420" scrolling="no" frameborder="0" style="overflow:hidden; width:800px; height:500px;" allowTransparency="true" loading="lazy"></iframe>
</center>

]

.panel[.panel-name[Key Points]

.can-edit.key-activity0b_viz[
**Main Insight(s):** .font70[(Insert below)]

- **Book Stores:** Trend: ... | Seasonality: ... | Cycle: ...
- **GDP 1:** Trend: ... | Seasonality: ... | Cycle: ...
- **GDP 2:** Trend: ... | Seasonality: ... | Cycle: ...
]

]

]

---
# Need Assistance with Trends

<img src="data:image/png;base64,#04_ts_eda_files/figure-html/trends-1.png" width="70%" style="display: block; margin: auto;" />

---
# Need Assistance with Seasonality

In [Section 2.2.1 of our reference book](https://wessexlearning.com/products/principles-of-business-forecasting-2nd-ed-part-i), the authors presented two approaches for considering seasonality. We can replicate them easily in R and Python. Refer to the discussion in the next section for more detail.

---
class: inverse, center, middle

# The Grammar of Graphics and the `ggplot2` package

---
# A Visual Introduction to Graph Layers

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#https://r.qcbs.ca/workshop03/book-en/images/Layers_ggplot.png" alt="The ggplot2 and plotnine packages are based on the Grammar of Graphics (GG), which is a framework for data visualization that dissects each component of a graph into individual components, creating distinct layers. Using the GG system, we can build graphs step-by-step for flexible, customizable results." width="100%" />
<p class="caption">Schematic of some distinct layers in the Grammar of Graphics.</p>
</div>

.footnote[
<html>
<hr>
</html>

Figure from the QCBS R Workshop Series. It is from [Chapter 05 of their third workshop](https://r.qcbs.ca/workshop03/book-en/grammar-of-graphics-gg-basics.html).
]

---
# The Grammar of Graphics

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#https://r.qcbs.ca/workshop03/book-en/images/gglayers.png" alt="An overview of the layers highlighted in the Grammar of Graphics." width="57%" />
<p class="caption">An overview of the layers introduced in the Grammar of Graphics.</p>
</div>

.footnote[
<html>
<hr>
</html>

Figure from the QCBS R Workshop Series. It is from [Chapter 05 of their third workshop](https://r.qcbs.ca/workshop03/book-en/grammar-of-graphics-gg-basics.html).
]

---
# Grammar of Graphics Layers: Data

- Data needs to be in a **tidy** format (see next two slides).
- The [dplyr](https://dplyr.tidyverse.org/reference/dplyr-package.html) and [tidyr](https://tidyr.tidyverse.org/) packages
can help with `tidying` your data.

---
background-image: url(data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/main/rstats-artwork/tidydata_1.jpg)
background-size: 95% 95%

.footnote[
**Source:** Illustration is from the Openscapes blog [Tidy Data for reproducibility, efficiency, and collaboration](https://www.openscapes.org/blog/2020/10/12/tidy-data/) by Julia Lowndes and Allison Horst
]

???

* In a database, this is the schema.
* Tidy data principles restate the third normal form of database schema design (<https://en.wikipedia.org/wiki/Third_normal_form>) for data scientists.
* Tidy data is for human consumption.
* Tabular data is a column-oriented format.

---
background-image: url(data:image/png;base64,#https://github.com/allisonhorst/stats-illustrations/raw/main/rstats-artwork/tidydata_2.jpg)
background-size: contain

.footnote[
**Source:** Illustration is from the Openscapes blog [Tidy Data for reproducibility, efficiency, and collaboration](https://www.openscapes.org/blog/2020/10/12/tidy-data/) by Julia Lowndes and Allison Horst
]

---
# Grammar of Graphics Layers: Aesthetics

**Aesthetics (`ggplot2::aes()`)** are used to make data visible. For example:

- `x`, `y`: variables to be plotted along the x and y axes.
- `color`: color of geoms (i.e., points, lines, etc.) according to the data.
- `fill`: the inside color of the geom (useful for bar charts).
- `group`: what group a geom belongs to (useful in multiple ts).
- `shape`: the shape of the plotted point (circle, triangle, filled circle, etc.).
- `linetype`: the type of line used (solid, dashed, etc.).
- `size`: size scaling for an extra dimension.
- `alpha`: the transparency of the geom.

---
# Identify the Aesthetics Used in the Charts
.panelset[

.panel[.panel-name[Activity]

> Over the next 6 minutes, please identify the aesthetics used in each chart.

- Write down your answers on the right side of each tab (it is editable).
- You can discuss your answers with your neighboring classmates.
- Be prepared to share these answers with the class.

]

.panel[.panel-name[Line Chart 1]

.pull-left-2[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/retailsales_linechart1-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right-2[
.can-edit.key-activity1a_viz[
**Main Aesthetics:** .font70[(Insert below)]

- `x`: .... and its class is: ....
- `y`: .... and its class is: ....
- `group`: .....
- `color`: .....
]
]

]

.panel[.panel-name[Line Chart 2]

.pull-left-2[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/retailsales_linechart2-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right-2[
.can-edit.key-activity1b_viz[
**Main Aesthetics:** .font70[(Insert below)]

- `x`: .... and its class is: ....
- `y`: .... and its class is: ....
- `group`: .....
- `color`: .....
]
]

]

.panel[.panel-name[Seasonal Chart 1]

.pull-left-2[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/retailsales_linechart3-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right-2[
.can-edit.key-activity1c_viz[
**Main Aesthetics:** .font70[(Insert below)]

- `x`: .... and its class is: ....
- `y`: .... and its class is: ....
- `group`: .....
- `color`: .....
]
]

]

.panel[.panel-name[Seasonal Chart 2]

.pull-left-2[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/retailsales_linechart4-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right-2[
.can-edit.key-activity1d_viz[
**Main Aesthetics:** .font70[(Insert below)]

- `x`: .... and its class is: ....
- `y`: .... and its class is: ....
- `group`: .....
- `color`: .....
]
]
]
]

---
# Grammar of Graphics Layers: Aesthetics

- Assigned **globally** to the entire plot via `ggplot2::ggplot(ggplot2::aes())`, or to **specific geoms** (e.g., `ggplot2::geom_point(ggplot2::aes())`).

.pull-left[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/global_aes-1.png" alt="The color is passed globally through ggplot(aes(x=date, y=price, color=year))." width="100%" />
<p class="caption">The color is passed globally through ggplot(aes(x=date, y=price, color=year)).</p>
</div>
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/local_aes-1.png" alt="The color is passed as an argument within the layer as geom_point(aes(color = year))." width="100%" />
<p class="caption">The color is passed as an argument within the layer as geom_point(aes(color = year)).</p>
</div>
]

---
# Grammar of Graphics Layers: Individual Geoms

**Geometric objects** (i.e., geoms) determine the type of plot. In this class, we will typically use one or more of the following *geoms*:

- `ggplot2::geom_point()`: scatterplot or points in a line graph.
- `ggplot2::geom_line()`: lines connecting points by increasing value of x.
- `ggplot2::geom_smooth()`: to fit a function line (e.g., linear regression line) based on data.

---
# Grammar of Graphics Layers: Facets

We can use `ggplot2::facet_wrap()` to create small multiples based on a single variable. Arguments for `ggplot2::facet_wrap()` include:

- `facets`, which takes the variable of interest in quotes (e.g., `facets = 'symbol'`);
- `nrow` and/or `ncol`, which take numeric inputs for the number of rows and columns; and
- `scales`, where we typically use `'free_y'` so that each panel gets its own `ylim`.
```r
fang_df = tidyquant::tq_get(
  x = c('META', 'AMZN', 'NFLX', 'GOOG'),
  from = '2010-01-01',
  to = Sys.Date()
)
colnames(fang_df)
```

```
## [1] "symbol"   "date"     "open"     "high"     "low"      "close"    "volume"
## [8] "adjusted"
```

---
count: false
# Grammar of Graphics Layers: Facets

We can use `ggplot2::facet_wrap()` to create small multiples based on a single variable. Arguments for `ggplot2::facet_wrap()` include:

- `facets`, which takes the variable of interest in quotes (e.g., `facets = 'symbol'`);
- `nrow` and/or `ncol`, which take numeric inputs for the number of rows and columns; and
- `scales`, where we typically use `'free_y'` so that each panel gets its own `ylim`.

.pull-left[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/faang_linechart-1.png" alt="A line chart of the FANG Closing Price Data." width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/faang_multiples-1.png" alt="A panel chart of the FANG Closing Price Data." width="90%" style="display: block; margin: auto;" />
]

---
# Grammar of Graphics Layers: Coordinates

In class, we will use the following two functions to create **snapshots** of the data:

- `ggplot2::coord_cartesian()` to set limits. We will specifically use its `xlim` argument to create a snapshot when the `\(x\)` axis contains a continuous variable (e.g., year). See the [ggplot2 documentation](https://ggplot2.tidyverse.org/reference/coord_cartesian.html) for more detail.

- `tidyquant::coord_x_date()` to set limits. We will specifically use its `xlim` argument to create a snapshot when the `\(x\)` axis contains a date variable (with a `class` of `Date`). See the [tidyquant documentation](https://business-science.github.io/tidyquant/reference/coord_x_date.html) for more detail.

---
# Grammar of Graphics: Themes

Themes control the overall visual defaults. Several themes are built into the `ggplot2` package
(see the [complete themes guide](https://ggplot2.tidyverse.org/reference/ggtheme.html)). For additional themes, please feel free to play with the [ggthemes](https://github.com/jrnold/ggthemes) package
.

.pull-left[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/fang_theme1-1.png" alt="Default ggplot2 theme" width="100%" />
<p class="caption">Default ggplot2 theme</p>
</div>
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/fang_theme2-1.png" alt="A modified theme (no gridlines, no gray background, and the caption is placed below)." width="100%" />
<p class="caption">A modified theme (no gridlines, no gray background, and the caption is placed below).</p>
</div>
]

---
class: inverse, center, middle

# Putting it all together

---
# A Singular TS: AAPL's Adj. Close

```r
if (!require(tidyquant)) install.packages("tidyquant") # install if needed
if (!require(tidyverse)) install.packages('tidyverse') # install if needed

aapl = # get AAPL stock data from 1st trading day after Jan 1, 2020 to now
  tidyquant::tq_get(x = 'AAPL', from = '2020-01-01', to = Sys.Date()) |>
  # select (i.e., keep) only the variables below
  dplyr::select( c(date, symbol, adjusted) ) |>
  # create the following variables: year and month
  dplyr::mutate(
    # date has to be of class Date; if not, use lubridate::ymd (mdy, dmy, etc.)
    # to convert the string variable to a date
    year = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  )

tail(aapl, n = 1) # print the last obs to see what we have
```

```
## # A tibble: 1 × 5
##   date       symbol adjusted  year month
##   <date>     <chr>     <dbl> <dbl> <ord>
## 1 2023-09-05 AAPL       190.  2023 Sep
```

---
# A Singular TS: The GG Layers

.left-code[
.small[
```r
*# layers are + in ggplot2
*ggplot2::ggplot(aapl)
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl1_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
* ggplot2::aes(x = date, y = adjusted)
)
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl2_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
* ggplot2::geom_point()
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl3_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
*   ggplot2::aes(color = month)
  )
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl4_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
    ggplot2::aes(color = month)
  ) +
* ggplot2::geom_line()
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl5_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
    ggplot2::aes(color = month)
  ) +
  ggplot2::geom_line() +
* ggplot2::geom_smooth(
*   method = lm, formula = 'y ~ x'
* )
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl6_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
    ggplot2::aes(color = month)
  ) +
  ggplot2::geom_line() +
  ggplot2::geom_smooth(
    method = lm, formula = 'y ~ x'
  ) +
* ggplot2::scale_x_date(
*   breaks = scales::pretty_breaks(n=8)
* ) +
* ggplot2::scale_y_continuous(
*   breaks = scales::pretty_breaks(n=6),
*   labels = scales::dollar)
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl7_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
    ggplot2::aes(color = month)
  ) +
  ggplot2::geom_line() +
  ggplot2::geom_smooth(
    method = lm, formula = 'y ~ x'
  ) +
  ggplot2::scale_x_date(
    breaks = scales::pretty_breaks(n=8)
  ) +
  ggplot2::scale_y_continuous(
    breaks = scales::pretty_breaks(n=6),
    labels = scales::dollar) +
* tidyquant::coord_x_date(
*   xlim = c('2023-06-01', '2023-09-30')
* )
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl8_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
    ggplot2::aes(color = month)
  ) +
  ggplot2::geom_line() +
  ggplot2::geom_smooth(
    method = lm, formula = 'y ~ x'
  ) +
  ggplot2::scale_x_date(
    breaks = scales::pretty_breaks(n=8)
  ) +
  ggplot2::scale_y_continuous(
    breaks = scales::pretty_breaks(n=6),
    labels = scales::dollar) +
  tidyquant::coord_x_date(
    xlim = c('2023-06-01', '2023-09-30')
  ) +
* ggplot2::theme_bw(base_size = 14) +
* ggplot2::theme(
*   legend.position = 'bottom'
* )
```
]
]

.right-plot[
<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl9_out-1.png" width="100%" style="display: block; margin: auto;" />
]

---
count:false
# A Singular TS: The GG Layers

.left-code[
.small[
```r
# layers are + in ggplot2
ggplot2::ggplot(
  aapl,
  ggplot2::aes(x = date, y = adjusted)
) +
  ggplot2::geom_point(
    ggplot2::aes(color = month)
  ) +
  ggplot2::geom_line() +
  ggplot2::geom_smooth(
    method = lm, formula = 'y ~ x'
  ) +
  ggplot2::scale_x_date(
    breaks = scales::pretty_breaks(n=8)
  ) +
  ggplot2::scale_y_continuous(
    breaks = scales::pretty_breaks(n=6),
    labels = scales::dollar) +
  tidyquant::coord_x_date(
    xlim = c('2023-06-01', '2023-09-30')
  ) +
  ggplot2::theme_bw(base_size = 14) +
  ggplot2::theme(
    legend.position = 'bottom'
* ) -> aapl_plot

*plotly::ggplotly(p = aapl_plot)
```
]
]

.right-plot[
]

---
# A Singular TS: Python Implementation

.left-code[
.small[
```python
import datetime as dt

import pandas as pd
import yfinance as yf
from plotnine import ggplot, aes, theme, theme_bw, geom_line, geom_point, geom_smooth

# extracting the data using the yfinance library (install via pip in console)
# auto_adjust=False keeps the 'Adj Close' column in recent yfinance releases
aapl = yf.download(
    tickers='AAPL',
    start=dt.datetime(2021, 1, 1), end=dt.datetime(2023, 9, 6),
    auto_adjust=False
)
# some yfinance versions return (field, ticker) column pairs; keep the field names
if isinstance(aapl.columns, pd.MultiIndex):
    aapl.columns = aapl.columns.get_level_values(0)

# creating features
aapl.reset_index(inplace=True)  # convert the index into a Date column
aapl['month'] = aapl['Date'].dt.month
aapl['year'] = aapl['Date'].dt.year

# plotting using plotnine
(
    ggplot(aapl, aes(x='Date', y='Adj Close')) +
    geom_point(aes(color='month')) +
    geom_line() +
    geom_smooth() +
    theme_bw(base_size=8) +
    theme(legend_position='bottom')
)
```
]
]

.right-plot[
```
## [*********************100%%**********************]  1 of 1 completed
```

```
## <Figure Size: (640 x 480)>
```

<img src="data:image/png;base64,#04_ts_eda_files/figure-html/aapl10p_out-1.png" width="350px" height="330px" style="display: block; margin: auto;" />
]

---
# Demo: `ts()` in R & `yfinance` in Python?

---
class: inverse, center, middle

# Summarizing Time-Series Data

---
# Measures of Average

**Mean:** Given a set of `\(n\)` values `\(Y_1, \, Y_2, \, \dots, \, Y_n\)`, the arithmetic mean can be computed as:

`$$\bar{Y} = \frac{Y_1 + Y_2 + \dots + Y_n}{n} = \frac{1}{n}\sum_{i=1}^{n}Y_i.$$`

<br>

**Order Statistics:** Given a set of `\(n\)` values `\(Y_1, \, Y_2, \, \dots, \, Y_n\)`, we place them in ascending order to define the order statistics, written as `\(Y_{(1)}, \, Y_{(2)}, \, \dots, \, Y_{(n)}.\)`

**Median:**

- If `\(n\)` is odd, `\(n = 2m + 1\)` and the median is `\(Y_{(m+1)}\)`.
- If `\(n\)` is even, `\(n = 2m\)` and the median is the average of the two middle numbers, i.e., `\(\frac{1}{2}[Y_{(m)} + Y_{(m+1)}]\)`.

---
# Measures of Variation

The **range** denotes the difference between the largest and smallest value in a sample:

`$$\text{Range} = Y_{(n)} - Y_{(1)}.$$`

The **deviation** is defined as the difference between a given observation `\(Y_i\)` and the mean `\(\bar{Y}\)`, i.e., `\(d_i = Y_i - \bar{Y}\)`.

The **mean absolute deviation (MAD)** is the average of the absolute deviations about the mean, i.e., the deviations irrespective of their sign:

$$
\text{MAD} = \frac{\sum_{i=1}^{n}|d_i|}{n}.
$$

The **variance** is an average of the squared deviations about the mean, computed with a divisor of `\(n-1\)`:

$$
S^2 = \frac{\sum_{i=1}^{n}d_i^2}{n-1}.
$$

---
# The GameStop Short Squeeze

.center[
]

.footnote[
<html>
<hr>
</html>

**Note:** Click [GameStop Short Squeeze](https://en.wikipedia.org/wiki/GameStop_short_squeeze) for more details.
]

---
## Summarizing the GME Short Squeeze: Avg/Var Measures

.pull-left-2[
.font80[
```r
gme_get = tidyquant::tq_get(x = 'GME', from = '2019-06-01', to = '2020-03-16') |>
  dplyr::select(date, adjusted) |>
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  )

gme_get
```
]
]

.pull-right-2[
```
## # A tibble: 198 × 4
##    date       adjusted  year month
##    <date>        <dbl> <dbl> <ord>
##  1 2019-06-03     1.87  2019 Jun
##  2 2019-06-04     1.96  2019 Jun
##  3 2019-06-05     1.26  2019 Jun
##  4 2019-06-06     1.28  2019 Jun
##  5 2019-06-07     1.25  2019 Jun
##  6 2019-06-10     1.36  2019 Jun
##  7 2019-06-11     1.43  2019 Jun
##  8 2019-06-12     1.38  2019 Jun
##  9 2019-06-13     1.42  2019 Jun
## 10 2019-06-14     1.41  2019 Jun
## # ℹ 188 more rows
```
]

---
count: false
## Summarizing the GME Short Squeeze: Avg/Var Measures

.pull-left-2[
.font80[
```r
gme_get = tidyquant::tq_get(x = 'GME', from = '2019-06-01', to = '2020-03-16') |>
  dplyr::select(date, symbol, adjusted) |>
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  )

*gme_summary =
* gme_get |>
* dplyr::group_by(symbol)

gme_summary
```
]
]

.pull-right-2[
```
## # A tibble: 198 × 5
## # Groups:   symbol [1]
##    date       symbol adjusted  year month
##    <date>     <chr>     <dbl> <dbl> <ord>
##  1 2019-06-03 GME        1.87  2019 Jun
##  2 2019-06-04 GME        1.96  2019 Jun
##  3 2019-06-05 GME        1.26  2019 Jun
##  4 2019-06-06 GME        1.28  2019 Jun
##  5 2019-06-07 GME        1.25  2019 Jun
##  6 2019-06-10 GME        1.36  2019 Jun
##  7 2019-06-11 GME        1.43  2019 Jun
##  8 2019-06-12 GME        1.38  2019 Jun
##  9 2019-06-13 GME        1.42  2019 Jun
## 10 2019-06-14 GME        1.41  2019 Jun
## # ℹ 188 more rows
```
]

---
count: false
## Summarizing the GME Short Squeeze: Avg/Var Measures

.pull-left-2[
.font80[
```r
gme_get = tidyquant::tq_get(x = 'GME', from = '2019-06-01', to = '2020-03-16') |>
  dplyr::select(date, symbol, adjusted) |>
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  )

gme_summary =
  gme_get |>
  dplyr::group_by(symbol) |>
* dplyr::summarise(
*   adjusted_avg = mean(adjusted),
*   adjusted_med = median(adjusted),
*   adjusted_var = var(adjusted),
*   adjusted_sd = sd(adjusted)
* )

gme_summary |> t() # transposing for printout
```
]
]

.pull-right-2[
```
##              [,1]
## symbol       "GME"
## adjusted_avg "1.240871"
## adjusted_med "1.2775"
## adjusted_var "0.0562454"
## adjusted_sd  "0.2371611"
```
]

---
count: false
## Summarizing the GME Short Squeeze: Avg/Var Measures

.pull-left-2[
.font80[
```r
gme_get = tidyquant::tq_get(x = 'GME', from = '2019-06-01', to = '2020-03-16') |>
  dplyr::select(date, symbol, adjusted) |>
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  )

gme_summary =
  gme_get |>
* dplyr::group_by(symbol, year) |>
  dplyr::summarise(
    adjusted_avg = mean(adjusted),
    adjusted_med = median(adjusted),
    adjusted_var = var(adjusted),
    adjusted_sd = sd(adjusted)
  )

gme_summary
```
]
]

.pull-right-2[
```
## # A tibble: 2 × 6
## # Groups:   symbol [1]
##   symbol  year adjusted_avg adjusted_med adjusted_var adjusted_sd
##   <chr>  <dbl>        <dbl>        <dbl>        <dbl>       <dbl>
## 1 GME     2019         1.29         1.36       0.0545       0.233
## 2 GME     2020         1.09         1.03       0.0298       0.173
```
]

---
count: false
## Summarizing the GME Short Squeeze: Avg/Var Measures

.pull-left-2[
.font80[
```r
gme_get = tidyquant::tq_get(x = 'GME', from = '2019-06-01', to = '2020-03-16') |>
  dplyr::select(date, symbol, adjusted) |>
  dplyr::mutate(
    year = lubridate::year(date),
    month = lubridate::month(date, label = TRUE)
  )

gme_summary =
  gme_get |>
* dplyr::group_by(symbol, year, month) |>
  dplyr::summarise(
    adjusted_avg = mean(adjusted),
    adjusted_med = median(adjusted),
    adjusted_var = var(adjusted),
    adjusted_sd = sd(adjusted)
  )

print(gme_summary, n=15)
```
]
]

.pull-right-2[
```
## # A tibble: 10 × 7
## # Groups:   symbol, year [2]
##    symbol  year month adjusted_avg adjusted_med adjusted_var adjusted_sd
##    <chr>  <dbl> <ord>        <dbl>        <dbl>        <dbl>       <dbl>
##  1 GME     2019 Jun          1.42         1.38       0.0308       0.176
##  2 GME     2019 Jul          1.16         1.13       0.0205       0.143
##  3 GME     2019 Aug          0.917        0.926      0.00397      0.0630
##  4 GME     2019 Sep          1.17         1.15       0.0136       0.117
##  5 GME     2019 Oct          1.43         1.40       0.0116       0.108
##  6 GME     2019 Nov          1.48         1.48       0.00428      0.0654
##  7 GME     2019 Dec          1.49         1.51       0.00953      0.0976
##  8 GME     2020 Jan          1.22         1.15       0.0321       0.179
##  9 GME     2020 Feb          0.981        1.00       0.00410      0.0640
## 10 GME     2020 Mar          0.994        0.985      0.00508      0.0713
```
]

---
# A Demo in
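As a minimal sketch, the measures of average and variation defined earlier can be computed directly from their formulas. The example uses plain Python (standard library only) and the first ten adjusted GME prices printed on the summary slides; `summarize()` is a hypothetical helper name introduced here for illustration, not a library function.

```python
import math

# First ten adjusted GME prices shown in the tibble printout (2019-06-03 to 2019-06-14).
prices = [1.87, 1.96, 1.26, 1.28, 1.25, 1.36, 1.43, 1.38, 1.42, 1.41]


def summarize(values):
    """Compute the slide's measures of average and variation (hypothetical helper)."""
    n = len(values)
    ordered = sorted(values)                 # order statistics Y_(1), ..., Y_(n)
    mean = sum(values) / n
    m = n // 2
    if n % 2 == 1:                           # n = 2m + 1: the median is Y_(m+1)
        median = ordered[m]
    else:                                    # n = 2m: average of Y_(m) and Y_(m+1)
        median = (ordered[m - 1] + ordered[m]) / 2
    deviations = [y - mean for y in values]  # d_i = Y_i - Y_bar
    mad = sum(abs(d) for d in deviations) / n
    variance = sum(d ** 2 for d in deviations) / (n - 1)
    return {
        "mean": mean,
        "median": median,
        "range": ordered[-1] - ordered[0],
        "mad": mad,
        "variance": variance,
        "sd": math.sqrt(variance),
    }


summary = summarize(prices)
for name, value in summary.items():
    print(f"{name:>8}: {value:.4f}")
```

The variance divides by `n - 1`, matching the formula on the variation slide, so it should agree with `statistics.variance()` from the standard library and with `var()` in R for the same ten values.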
---
class: inverse, center, middle

# Correlation

---
# The Pearson Correlation Coefficient

- **Correlation:** measures the strength of the **linear relationship** between two quantitative variables.

- It can be computed using the `cor()` function from base R. Mathematically speaking, the Pearson correlation coefficient, `\(r\)`, can be computed as
`$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$`

- Do **not** use the Pearson correlation coefficient unless both variables are quantitative. Instead, refer to `mixed.cor()` from the [psych package](https://personality-project.org/r/psych/help/mixed.cor.html) to compute the correlations for mixtures of continuous, polytomous, and/or dichotomous variables.

- You should supplement **any descriptive summaries with visualizations** to ensure that you are able to interpret the computations correctly.

---
## Supplement Summaries with Viz: Anscombe's Dataset

**In a seminal paper, Anscombe stated:**

> **Few of us escape being indoctrinated with these notions:**
> - numerical **calculations are exact, but graphs are rough**;
> - for any particular kind of **statistical data there is just one set of calculations constituting a correct statistical analysis**;
> - performing **intricate calculations is virtuous**, whereas **actually looking at the data is cheating**.

He proceeded by stating that

> a computer should **make both calculations and graphs**. Both sorts of output should be studied; each will contribute to understanding.

Now, let us consider his four datasets, each consisting of eleven (x,y) pairs.

.footnote[
<html>
<hr>
</html>

**Source:** Anscombe, Francis J. 1973. "Graphs in Statistical Analysis." *The American Statistician* 27 (1): 17–21. ([PDF Link](https://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf)).
]

---
count: false
## Supplement Summaries with Viz: Anscombe's Dataset

.font80[
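A minimal sketch of the Pearson formula from the correlation slide, applied to the first two of Anscombe's four datasets, which share the same x values. The numbers are the ones published in the 1973 paper; `pearson_r()` is a hypothetical helper written for illustration, and in practice `cor()` in R would typically be used instead.

```python
import math

# Anscombe's datasets I and II share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(xs, ys):
    """Pearson correlation computed term by term from the slide formula (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r1 = pearson_r(x, y1)
r2 = pearson_r(x, y2)
print(f"Dataset I:  mean(y) = {sum(y1) / len(y1):.2f}, r = {r1:.3f}")
print(f"Dataset II: mean(y) = {sum(y2) / len(y2):.2f}, r = {r2:.3f}")
```

Both correlations round to about 0.82 and the two y means agree to two decimals, even though dataset I is roughly linear and dataset II is clearly curved. This is why numerical summaries should be paired with a plot.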
]

---
count: false
## Supplement Summaries with Viz: Anscombe's Dataset

.font80[
]

---
count: false
## Supplement Summaries with Viz: Anscombe's Dataset

<img src="data:image/png;base64,#04_ts_eda_files/figure-html/anscombe4-1.png" width="70%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Recap

---
# Summary of Main Points

By now, you should be able to do the following:

- Examine the goals of utilizing line charts in time-series analysis (i.e., detect trends, seasonality, and cycles).
- Develop a deeper understanding of the grammar of graphics, which we used to create time series plots in R and Python.
- Use numerical summaries to describe a time series.
- Explain what we mean by correlation.

---
# 📝 Review and Clarification 📝

1. **Class Notes**: Take some time to revisit your class notes for key insights and concepts.
2. **Zoom Recording**: The recording of today's class will be made available on Canvas approximately 3-4 hours after the session ends.
3. **Questions**: Please don't hesitate to ask for clarification on any topics discussed in class. It's crucial not to let questions accumulate.

---
# 📖 Required Readings 📖

.pull-left[
.center[[<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" height="350px">](https://r4ds.had.co.nz)]
]

.pull-right[
* [Data Visualization](https://r4ds.had.co.nz/data-visualisation.html)
* [Graphics for Communication](https://r4ds.had.co.nz/graphics-for-communication.html)
* [Dates and Times](https://r4ds.had.co.nz/dates-and-times.html)
]

---
# 🎯 Assignment 🎯

- Go over your notes and complete [Assignment 02](https://miamioh.instructure.com/courses/200821/quizzes/582798) on Canvas.